Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[action] [PR:16235] [sanity_check][bgp] Add default route check in sanity for single asic #16247

Merged
merged 1 commit into from
Dec 30, 2024

Conversation

mssonicbld
Copy link
Collaborator

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405
  • 202411

Approach

What is the motivation for this PR?

BGP routes would be setup during add-topo https://github.com/sonic-net/sonic-mgmt/blob/master/ansible/roles/vm_set/tasks/add_topo.yml#L276.
But there are some scenarios that route in DUT has been messed up, but bgp sessions are all up, sanity would treat it as healthy and wouldn't take action to recover it.

  1. Loopbackv4 address has been replaced, it would cause all kernel routes from bgp miss
  2. In some test cases announce or withdraw routes from ptf but fail to recover (i.e. test_stress_routes)

Healthy status:

admin@sonic:~$ ip route show default
default nhid 282 proto bgp src 10.1.0.32 metric 20 
 nexthop via 10.0.0.57 dev PortChannel101 weight 1 
 nexthop via 10.0.0.59 dev PortChannel103 weight 1 
 nexthop via 10.0.0.61 dev PortChannel105 weight 1 
 nexthop via 10.0.0.63 dev PortChannel106 weight 1 
admin@sonic:~$ show ip bgp sum

IPv4 Unicast Summary:
BGP router identifier 10.1.0.32, local AS number 65100 vrf-id 0
BGP table version 2890
RIB entries 2893, using 648032 bytes of memory
Peers 6, using 4451856 KiB of memory
Peer groups 4, using 256 bytes of memory


Neighbhor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd NeighborName
----------- --- ----- --------- --------- -------- ----- ------ --------- -------------- --------------
10.0.0.57 4 65200 763 764 0 0 0 11:46:17 1439 ARISTA01M1
10.0.0.59 4 65200 763 765 0 0 0 11:46:17 1439 ARISTA02M1
10.0.0.61 4 65200 763 765 0 0 0 11:46:17 1439 ARISTA03M1
10.0.0.63 4 65200 763 765 0 0 0 11:46:17 1439 ARISTA04M1
10.0.0.65 4 64001 712 761 0 0 0 11:46:15 2 ARISTA01MX
10.0.0.67 4 64002 712 761 0 0 0 11:46:15 2 ARISTA02MX

Total number of neighbors 6

Issue status, no default route, but show ip bgp sum looks good

admin@sonic:~$ ip route show default
admin@sonic:~$ show ip bgp sum

IPv4 Unicast Summary:
BGP router identifier 10.1.0.32, local AS number 65100 vrf-id 0
BGP table version 2892
RIB entries 2893, using 648032 bytes of memory
Peers 6, using 4451856 KiB of memory
Peer groups 4, using 256 bytes of memory


Neighbhor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd NeighborName
----------- --- ----- --------- --------- -------- ----- ------ --------- -------------- --------------
10.0.0.57 4 65200 764 767 0 0 0 11:47:14 1439 ARISTA01M1
10.0.0.59 4 65200 764 768 0 0 0 11:47:14 1439 ARISTA02M1
10.0.0.61 4 65200 764 768 0 0 0 11:47:14 1439 ARISTA03M1
10.0.0.63 4 65200 764 768 0 0 0 11:47:14 1439 ARISTA04M1
10.0.0.65 4 64001 713 764 0 0 0 11:47:12 2 ARISTA01MX
10.0.0.67 4 64002 713 764 0 0 0 11:47:12 2 ARISTA02MX

Total number of neighbors 6

How did you do it?

Add default routes check in sanity check, and re-announce routes if issue happen

How did you verify/test it?

Run sanity check

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

…sonic-net#16235)

What is the motivation for this PR?
BGP routes would be setup during add-topo https://github.com/sonic-net/sonic-mgmt/blob/master/ansible/roles/vm_set/tasks/add_topo.yml#L276.
But there are some scenarios that route in DUT has been messed up, but bgp sessions are all up, sanity would treat it as healthy and wouldn't take action to recover it.

Loopbackv4 address has been replaced, it would cause all kernel routes from bgp miss
In some test cases announce or withdraw routes from ptf but fail to recover (i.e. test_stress_routes)
Healthy status:

admin@sonic:~$ ip route show default
default nhid 282 proto bgp src 10.1.0.32 metric 20 
        nexthop via 10.0.0.57 dev PortChannel101 weight 1 
        nexthop via 10.0.0.59 dev PortChannel103 weight 1 
        nexthop via 10.0.0.61 dev PortChannel105 weight 1 
        nexthop via 10.0.0.63 dev PortChannel106 weight 1 
admin@sonic:~$ show ip bgp sum

IPv4 Unicast Summary:
BGP router identifier 10.1.0.32, local AS number 65100 vrf-id 0
BGP table version 2890
RIB entries 2893, using 648032 bytes of memory
Peers 6, using 4451856 KiB of memory
Peer groups 4, using 256 bytes of memory


Neighbhor      V     AS    MsgRcvd    MsgSent    TblVer    InQ    OutQ  Up/Down      State/PfxRcd  NeighborName
-----------  ---  -----  ---------  ---------  --------  -----  ------  ---------  --------------  --------------
10.0.0.57      4  65200        763        764         0      0       0  11:46:17             1439  ARISTA01M1
10.0.0.59      4  65200        763        765         0      0       0  11:46:17             1439  ARISTA02M1
10.0.0.61      4  65200        763        765         0      0       0  11:46:17             1439  ARISTA03M1
10.0.0.63      4  65200        763        765         0      0       0  11:46:17             1439  ARISTA04M1
10.0.0.65      4  64001        712        761         0      0       0  11:46:15                2  ARISTA01MX
10.0.0.67      4  64002        712        761         0      0       0  11:46:15                2  ARISTA02MX

Total number of neighbors 6
Issue status, no default route, but show ip bgp sum looks good

admin@sonic:~$ ip route show default
admin@sonic:~$ show ip bgp sum

IPv4 Unicast Summary:
BGP router identifier 10.1.0.32, local AS number 65100 vrf-id 0
BGP table version 2892
RIB entries 2893, using 648032 bytes of memory
Peers 6, using 4451856 KiB of memory
Peer groups 4, using 256 bytes of memory


Neighbhor      V     AS    MsgRcvd    MsgSent    TblVer    InQ    OutQ  Up/Down      State/PfxRcd  NeighborName
-----------  ---  -----  ---------  ---------  --------  -----  ------  ---------  --------------  --------------
10.0.0.57      4  65200        764        767         0      0       0  11:47:14             1439  ARISTA01M1
10.0.0.59      4  65200        764        768         0      0       0  11:47:14             1439  ARISTA02M1
10.0.0.61      4  65200        764        768         0      0       0  11:47:14             1439  ARISTA03M1
10.0.0.63      4  65200        764        768         0      0       0  11:47:14             1439  ARISTA04M1
10.0.0.65      4  64001        713        764         0      0       0  11:47:12                2  ARISTA01MX
10.0.0.67      4  64002        713        764         0      0       0  11:47:12                2  ARISTA02MX

Total number of neighbors 6
How did you do it?
Add default routes check in sanity check, and re-announce routes if issue happen

How did you verify/test it?
Run sanity check
@mssonicbld
Copy link
Collaborator Author

/azp run

@mssonicbld
Copy link
Collaborator Author

Original PR: #16235

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator Author

/azp run Azure.sonic-mgmt

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld mssonicbld merged commit ad72224 into sonic-net:202411 Dec 30, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants